Skip to content

NFS3 protocol specific snapshot workflows#36

Merged
rajiv-jain-netapp merged 33 commits intomainfrom
feature/CSTACKEX-18_2
Mar 9, 2026
Merged

NFS3 protocol specific snapshot workflows#36
rajiv-jain-netapp merged 33 commits intomainfrom
feature/CSTACKEX-18_2

Conversation

@rajiv-jain-netapp
Copy link

@rajiv-jain-netapp rajiv-jain-netapp commented Feb 20, 2026

Description

This PR...

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • Build/CI
  • Test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

How did you try to break this feature and the system with this change?

@rajiv-jain-netapp
Copy link
Author

I have tested

Taking a CS-volume snapshot for the data CS-volume.
Tried to take VM-level snapshot, CloudStack asked me to select a single CS-volume instead of allowing to take a VM-level snapshot. That also worked.

@rajiv-jain-netapp
Copy link
Author

I have tested VM level and cloudstack-volume level snapshots, all found working

  1. VM level snapshot without memory-disk and quiecing option.
  2. VM level snapshot with quiecing option enabled. --> did not find working and error says, unable to quiece VM since emu agent is not found working on VM
  3. VM level snapshot with memory snapshot enabled--> working
  4. VM level snapshot with memory-snapshot and quiecing options enabled --> working fine.
  5. cloudstack volume level snapshots are working fine.


@Override
public void takeSnapshot(SnapshotInfo snapshot, AsyncCompletionCallback<CreateCmdResult> callback) {
s_logger.error("temp takeSnapshot : entered with snapshot id: " + snapshot.getId() + " and name: " + snapshot.getName());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log level should not be error plus remove temp

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this was intentionally kept for testing and was mistakenly overlooked during cleanup.

Map<String, String> poolDetails = storagePoolDetailsDao.listDetailsKeyPairs(volumeVO.getPoolId());
StorageStrategy storageStrategy = Utility.getStrategyByStoragePoolDetails(poolDetails);

Map<String, String> cloudStackVolumeRequestMap = new HashMap<>();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Get Volume Request is not as per protocol

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added protocol check.

cloudStackVolumeRequestMap.put(Constants.VOLUME_UUID, poolDetails.get(Constants.VOLUME_UUID));
cloudStackVolumeRequestMap.put(Constants.FILE_PATH, volumeVO.getPath());
CloudStackVolume cloudStackVolume = storageStrategy.getCloudStackVolume(cloudStackVolumeRequestMap);
if (cloudStackVolume == null || cloudStackVolume.getFile() == null) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this condition is also specific to file, it will fail for lun

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

taken care

throw new CloudRuntimeException("takeSnapshot: Failed to get source file to take snapshot");
}
long capacityBytes = storagePool.getCapacityBytes();
s_logger.error("temp takeSnapshot : entered after getting cloudstack volume with file path: " + cloudStackVolume.getFile().getPath() + " and size: " + cloudStackVolume.getFile().getSize());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log level should not be error

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed

long capacityBytes = storagePool.getCapacityBytes();
s_logger.error("temp takeSnapshot : entered after getting cloudstack volume with file path: " + cloudStackVolume.getFile().getPath() + " and size: " + cloudStackVolume.getFile().getSize());
long usedBytes = getUsedBytes(storagePool);
long fileSize = cloudStackVolume.getFile().getSize();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these all changes are related to file

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed


String fileSnapshotName = volumeInfo.getName() + "-" + snapshot.getUuid();

int maxSnapshotNameLength = 64;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, we can have more character in snapshot name

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved it to contants for now, will update the length by checking it, if required.

return cloudStackVolumeRequest;
}

private CloudStackVolume snapshotCloudStackVolumeRequestByProtocol(Map<String, String> details,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this method is not used anywhere

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am calling this method in takeSnapshot()

Copy link
Author

@rajiv-jain-netapp rajiv-jain-netapp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reposting the PR with changes


@Override
public void takeSnapshot(SnapshotInfo snapshot, AsyncCompletionCallback<CreateCmdResult> callback) {
s_logger.error("temp takeSnapshot : entered with snapshot id: " + snapshot.getId() + " and name: " + snapshot.getName());
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this was intentionally kept for testing and was mistakenly overlooked during cleanup.

Map<String, String> poolDetails = storagePoolDetailsDao.listDetailsKeyPairs(volumeVO.getPoolId());
StorageStrategy storageStrategy = Utility.getStrategyByStoragePoolDetails(poolDetails);

Map<String, String> cloudStackVolumeRequestMap = new HashMap<>();
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added protocol check.

cloudStackVolumeRequestMap.put(Constants.VOLUME_UUID, poolDetails.get(Constants.VOLUME_UUID));
cloudStackVolumeRequestMap.put(Constants.FILE_PATH, volumeVO.getPath());
CloudStackVolume cloudStackVolume = storageStrategy.getCloudStackVolume(cloudStackVolumeRequestMap);
if (cloudStackVolume == null || cloudStackVolume.getFile() == null) {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

taken care

throw new CloudRuntimeException("takeSnapshot: Failed to get source file to take snapshot");
}
long capacityBytes = storagePool.getCapacityBytes();
s_logger.error("temp takeSnapshot : entered after getting cloudstack volume with file path: " + cloudStackVolume.getFile().getPath() + " and size: " + cloudStackVolume.getFile().getSize());
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed

long capacityBytes = storagePool.getCapacityBytes();
s_logger.error("temp takeSnapshot : entered after getting cloudstack volume with file path: " + cloudStackVolume.getFile().getPath() + " and size: " + cloudStackVolume.getFile().getSize());
long usedBytes = getUsedBytes(storagePool);
long fileSize = cloudStackVolume.getFile().getSize();
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed


String fileSnapshotName = volumeInfo.getName() + "-" + snapshot.getUuid();

int maxSnapshotNameLength = 64;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved it to contants for now, will update the length by checking it, if required.

return cloudStackVolumeRequest;
}

private CloudStackVolume snapshotCloudStackVolumeRequestByProtocol(Map<String, String> details,
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am calling this method in takeSnapshot()

mapCapabilities.put(DataStoreCapabilities.STORAGE_SYSTEM_SNAPSHOT.toString(), Boolean.FALSE.toString());
mapCapabilities.put(DataStoreCapabilities.CAN_CREATE_VOLUME_FROM_SNAPSHOT.toString(), Boolean.FALSE.toString());
mapCapabilities.put(DataStoreCapabilities.STORAGE_SYSTEM_SNAPSHOT.toString(), Boolean.TRUE.toString());
mapCapabilities.put(DataStoreCapabilities.CAN_CREATE_VOLUME_FROM_SNAPSHOT.toString(), Boolean.TRUE.toString());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to set this as true as we are currently not supporting volume creation from snapshot.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without this property set to true, CloudStack initiates both the snapshot creation and the corresponding copy operation to secondary storage over the data path, as it assumes that snapshots must reside on secondary storage by default.
When only the storage-system-snapshot property is enabled, CloudStack successfully invokes the takeSnapshot() method in the plugin driver class. However, the snapshot remains on the primary storage pool and is not transferred to secondary storage.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, with this implementation, snapshots reside in primary storage itself?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I’m trying to explain is that if we don’t keep the second flag set to true, then only the plugin‑created snapshot will remain on the primary. In that case, the orchestrator will call the host agent again to create another snapshot and copy it to the secondary.
To avoid this unnecessary duplicate operation, I set both flags to true. This ensures that the orchestrator does not rely on the host-agent portion of the workflow, and the snapshot created by the plugin is the one that gets copied to the secondary.

return new Pair<>(snapshotXml, volumeObjectToNewPathMap);
}

protected void cleanupLeftoverDeltas(List<VolumeObjectTO> volumeObjectTos, Map<String, Pair<Long, String>> mapVolumeToSnapshotSizeAndNewVolumePath, KVMStoragePoolManager storagePoolMgr) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this refactoring somehow help our plugin functionality?

mapCapabilities.put(DataStoreCapabilities.STORAGE_SYSTEM_SNAPSHOT.toString(), Boolean.FALSE.toString());
mapCapabilities.put(DataStoreCapabilities.CAN_CREATE_VOLUME_FROM_SNAPSHOT.toString(), Boolean.FALSE.toString());
mapCapabilities.put(DataStoreCapabilities.STORAGE_SYSTEM_SNAPSHOT.toString(), Boolean.TRUE.toString());
mapCapabilities.put(DataStoreCapabilities.CAN_CREATE_VOLUME_FROM_SNAPSHOT.toString(), Boolean.TRUE.toString());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are supporting volume creation from snapshot with this PR?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

volume creation usecase is not yet ready as part of this PR


@Override
public void takeSnapshot(SnapshotInfo snapshot, AsyncCompletionCallback<CreateCmdResult> callback) {
s_logger.info("takeSnapshot : entered with snapshot id: " + snapshot.getId() + " and name: " + snapshot.getName());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets have an intuitive log message instead of a method entry logger?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you provide a few examples?
Since I am also logging the method arguments, I felt that this approach would give me better insight into the method invocation flow, rather than relying solely on static logging.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of adding an entry log we can add a log post line 544. For example, it could be like
s_logger.info("takeSnapshot: Snapshot with ID: " + snapshot.getId() + " and name: " + snapshot.getName() + " requested for volume with ID: " + volumeVO.getId() + " and name: " + volumeVO.getName())
It's just an example but I feel this could give us more outlook on whats happening.

cloudStackVolumeRequestMap.put(Constants.FILE_PATH, volumeVO.getPath());
cloudStackVolume = storageStrategy.getCloudStackVolume(cloudStackVolumeRequestMap);
if (cloudStackVolume == null || cloudStackVolume.getFile() == null) {
throw new CloudRuntimeException("takeSnapshot: Failed to get source file to take snapshot");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have method names only in loggers and not have them in exceptions? Management server might show this as an error on the UI?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

acknowledged

try {
fileResponse = nasFeignClient.getFileResponse(authHeader, volumeUuid, filePath);
if (fileResponse == null || fileResponse.getRecords().isEmpty()) {
throw new CloudRuntimeException("File " + filePath + " not not found on ONTAP. " +
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo "not not"

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still see the same

return false;
}

if (!Hypervisor.HypervisorType.KVM.equals(userVm.getHypervisorType())) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have checks in StoragePool and CS Volume creation workflows for Hypervisor type right? Do we need it here also?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is inline with the snapshot workflow

return false;
}

if (!VirtualMachine.State.Running.equals(userVm.getState())) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this block disk snapshots?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope


updateSnapshotDetails(snapshot.getId(), volumeInfo.getId(), poolDetails.get(Constants.VOLUME_UUID), cloneCloudStackVolume.getFile().getPath(), volumeVO.getPoolId(), fileSize);

snapshotObjectTo.setPath(Constants.ONTAP_SNAP_ID +"="+cloneCloudStackVolume.getFile().getPath());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is "=" intentional ?

CloudStackVolume snapCloudStackVolumeRequest = snapshotCloudStackVolumeRequestByProtocol(poolDetails, volumeVO.getPath(), snapshotName);
CloudStackVolume cloneCloudStackVolume = storageStrategy.snapshotCloudStackVolume(snapCloudStackVolumeRequest);

updateSnapshotDetails(snapshot.getId(), volumeInfo.getId(), poolDetails.get(Constants.VOLUME_UUID), cloneCloudStackVolume.getFile().getPath(), volumeVO.getPoolId(), fileSize);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If iSCSI needs different things to be updated in DB, we might get an if-else on protocol here also.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, we have to handle that with iscsi protocol.

if (volume.getPoolId() == null) {
return false;
}
StoragePoolVO pool = storagePool.findById(volume.getPoolId());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can optimize here -> Instead of making findById for each volume, we can use listByUuids by passing all volumes together. This will save much db calls and memory.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DB calls are local and not very expensive.

boolean quiescevm = true;
VMSnapshotOptions options = vmSnapshotVO.getOptions();
if (options != null && !options.needQuiesceVM()) {
logger.info("Quiesce option was set to false, but overriding to true for ONTAP managed storage " +

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are planning to only support app consistency, why not throw an error message stating ONTAP recommends quiescing the vm instead of overriding in backend without notifying user.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logic has been changed to adhering the user input on quiecing

// TODO Set it to false once we start supporting snapshot feature
mapCapabilities.put(DataStoreCapabilities.STORAGE_SYSTEM_SNAPSHOT.toString(), Boolean.FALSE.toString());
mapCapabilities.put(DataStoreCapabilities.CAN_CREATE_VOLUME_FROM_SNAPSHOT.toString(), Boolean.FALSE.toString());
// STORAGE_SYSTEM_SNAPSHOT=TRUE enables StorageSystemSnapshotStrategy to handle snapshots.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove comments if applicable

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}
}

/**

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these commented code was left in earlier merges, I will remove it.

<bean id="ontapPrimaryDataStoreProvider"
class="org.apache.cloudstack.storage.provider.OntapPrimaryDatastoreProvider" />

<bean id="ontapVMSnapshotStrategy"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not inject it through configure method of OntapPrimaryDatastoreProvider which will also follow cloudstack injection framework.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this change is for context to be given tot he cloudstack for the classes we added and we want them to be loaded in the JVM

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that way also , it will give context to the cloudstack and will be loaded in JVM once ontap driver is loaded.

this.sanFeignClient = feignClientFactory.createClient(SANFeignClient.class, baseURL);
}

@Override

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need this here again? We already have sanFeignClient in it constructor.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, corrected it now

SnapshotObjectTO snapshotObjectTo = (SnapshotObjectTO) snapshot.getTO();

// Build snapshot name using volume name and snapshot UUID
String snapshotName = buildSnapshotCloneName(volumeInfo.getName(), snapshot.getUuid());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clone ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is just a method name, but corrected.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you notice, it is just a method name. But I will update this.

Copy link

@suryag1201 suryag1201 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comments resolved


VolumeVO volumeVO = volumeDao.findById(volumeInfo.getId());
if (volumeVO == null) {
throw new CloudRuntimeException("takeSnapshot: VolumeVO not found for id: " + volumeInfo.getId());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't add method names in exceptions, as we don't have any cleanup framework before showing them on UI

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand. I was watching such instances, some of which are still overlooked.

*/
@Override
public void revertSnapshot(SnapshotInfo snapshotOnImageStore, SnapshotInfo snapshotOnPrimaryStore, AsyncCompletionCallback<CommandResult> callback) {
public void revertSnapshot(SnapshotInfo snapshotOnImageStore, SnapshotInfo snapshotOnPrimaryStore,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that 'CAN_CREATE_VOLUME_FROM_SNAPSHOT' in the capabilities list has been set to 'True', but createAsync doesn't have any implementation to create volume from snapshot. So, if we are not supporting creating volume from snapshot, do we need revertSnapshot implementation?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We’ve made good progress on the revert while working in parallel on the snapshot design, since both need to align. Our first PR is still awaiting community review, which is giving us additional time to advance the revert work. I plan to finalize the revert workflow once we complete the upcoming planned milestone. If time permits, the goal is to include the revert changes in the next PR as well.

Copy link

@piyush5netapp piyush5netapp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add testing screenshot .

@rajiv-jain-netapp rajiv-jain-netapp merged commit fccaf83 into main Mar 9, 2026
16 checks passed
@rajiv-jain-netapp rajiv-jain-netapp deleted the feature/CSTACKEX-18_2 branch March 9, 2026 11:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants